A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
Authors
Abstract
Bayesian learning methods have recently been shown to provide an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date have focused on standard Markov Decision Processes (MDPs). The primary focus of this paper is to extend these ideas to the case of partially observable domains by introducing the Bayes-Adaptive Partially Observable Markov Decision Process (BA-POMDP). This new framework can be used to simultaneously (1) learn a model of the POMDP domain through interaction with the environment, (2) track the state of the system under partial observability, and (3) plan (near-)optimal sequences of actions. An important contribution of this paper is to provide theoretical results showing how the model can be finitely approximated while preserving good learning performance. We present approximate algorithms for belief tracking and planning in this model, as well as empirical results that illustrate how the model estimate and the agent's return improve as a function of experience.
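As a rough illustration of how capabilities (1) and (2) can interact, the sketch below implements a particle-filter belief update over pairs of a hidden state and Dirichlet counts, so that tracking the state and refining the model estimate happen in a single step. This is a hypothetical minimal example, not the algorithm from the paper: all names (update_belief, sample_next_state, observation_prob) are invented here, and a flat Dirichlet prior (pseudo-count 1) over the unknown transition and observation distributions is assumed.

```python
import random

def sample_next_state(trans_counts, s, a, states):
    """Sample s' from the mean transition model implied by the Dirichlet
    counts; missing entries default to the assumed prior pseudo-count of 1."""
    weights = [trans_counts.get((s, a, s2), 1) for s2 in states]
    return random.choices(states, weights=weights, k=1)[0]

def observation_prob(obs_counts, s2, a, o, observations):
    """Mean probability of observing o after reaching s' via action a."""
    weights = {o2: obs_counts.get((s2, a, o2), 1) for o2 in observations}
    return weights[o] / sum(weights.values())

def update_belief(particles, a, o, states, observations):
    """One Bayes-adaptive belief update. Each particle is a tuple
    (state, transition counts, observation counts): propagate the state,
    weight by the likelihood of the received observation o, record the
    sampled transition and the observation in the counts, then resample."""
    candidates, weights = [], []
    for s, trans_counts, obs_counts in particles:
        s2 = sample_next_state(trans_counts, s, a, states)
        w = observation_prob(obs_counts, s2, a, o, observations)
        tc, oc = dict(trans_counts), dict(obs_counts)  # copy per particle
        tc[(s, a, s2)] = tc.get((s, a, s2), 1) + 1     # stored counts include the prior
        oc[(s2, a, o)] = oc.get((s2, a, o), 1) + 1
        candidates.append((s2, tc, oc))
        weights.append(w)
    return random.choices(candidates, weights=weights, k=len(particles))

# Example: an agent with unknown dynamics starts from a uniform belief,
# then updates it after taking an action and receiving an observation.
states, observations = [0, 1], ["left", "right"]
belief = [(random.choice(states), {}, {}) for _ in range(200)]
belief = update_belief(belief, "a0", "left", states, observations)
```

Planning on top of such a belief could then, for instance, perform a short Monte Carlo lookahead over sampled particles; the approximate planning algorithms the paper actually proposes operate on this kind of augmented state-plus-counts belief.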
Similar articles
The Essential Benefits of the POMDP Approach Can
This article argues that future generations of computer-based systems will need cognitive user interfaces to achieve sufficiently robust and intelligent human interaction. These cognitive user interfaces will be characterized by the ability to support inference and reasoning, planning under uncertainty, short-term adaptation, and long-term learning from experience. An appropriate engineering fr...
Learning discrete Bayesian models for autonomous agent navigation
Partially observable Markov decision processes (POMDPs) are a convenient representation for reasoning and planning in mobile robot applications. We investigate two algorithms for learning POMDPs from series of observation/action pairs by comparing their performance in fourteen synthetic worlds in conjunction with four planning algorithms. Experimental results suggest that the traditional Baum-W...
Decision Making under Uncertainty: Operations Research Meets AI (Again)
Models for sequential decision making under uncertainty (e.g., Markov decision processes, or MDPs) have been studied in operations research for decades. The recent incorporation of ideas from many areas of AI (including planning, probabilistic modeling, machine learning, and knowledge representation) has made these models much more widely applicable. I briefly survey recent advances within AI i...
Learning Others' Intentional Models in Multi-Agent Settings Using Interactive POMDPs
Interactive partially observable Markov decision processes (I-POMDPs) provide a principled framework for planning and acting in a partially observable, stochastic and multiagent environment, extending POMDPs to multi-agent settings by including models of other agents in the state space and forming a hierarchical belief structure. In order to predict other agents’ actions using I-POMDP, we propo...
Representing hierarchical POMDPs as DBNs for multi-scale map learning
We explore the advantages of representing hierarchical partially observable Markov decision processes (H-POMDPs) as dynamic Bayesian networks (DBNs). We use this model for representing and learning multi-resolution spatial maps for indoor robot navigation. Our results show that a DBN representation of H-POMDPs can train significantly faster than the original learning algorithm for H-POMDPs or t...
Journal: Journal of Machine Learning Research
Volume: 12, Issue: -
Pages: -
Published: 2011